A Segment-based Weighting Technique for URL-based Genre Classification of Web Pages

Authors: Chaker Jebari

Polibits, Vol. 53, pp. 43-48, 2016.

Abstract: We propose a segment-based weighting technique for genre classification of web pages. The technique uses character n-grams extracted from the URL of the web page rather than from its textual content. The main idea is to split the URL into segments and assign a weight to each segment. Experiments conducted on three known genre datasets show that our method achieves encouraging results.

Keywords: URL, genre classification, web page, segment weight
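
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the three-way segmentation (domain, path, query), the weight values in SEGMENT_WEIGHTS, the n-gram length of 3, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical segment weights; the abstract does not state the actual values.
SEGMENT_WEIGHTS = {"domain": 1.0, "path": 2.0, "query": 0.5}


def split_url(url):
    """Split a URL into three coarse segments (an assumed segmentation)."""
    parts = urlsplit(url)
    return {
        "domain": parts.netloc.lower(),
        "path": parts.path.lower(),
        "query": parts.query.lower(),
    }


def char_ngrams(text, n):
    """Return all overlapping character n-grams of text (empty if too short)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def weighted_ngram_features(url, n=3, weights=SEGMENT_WEIGHTS):
    """For each n-gram, sum the weight of the segment of every occurrence."""
    features = Counter()
    for name, text in split_url(url).items():
        for gram in char_ngrams(text, n):
            features[gram] += weights[name]
    return dict(features)


features = weighted_ngram_features("http://example.org/blog/post")
```

The resulting dictionary maps each character n-gram to a weighted count and could be passed to any standard text classifier. With these assumed weights, an n-gram from the path contributes twice as much as the same n-gram from the domain.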

PDF: A Segment-based Weighting Technique for URL-based Genre Classification of Web Pages

http://dx.doi.org/10.17562/PB-53-4
